DETAILED ACTION
Response to Arguments 

Applicant's arguments with respect to claims 1-5, 11, 13-29 have been 
considered but are moot in view of the new ground(s) of rejection. 

Claim Objections 

Claims 1-5, 11, 13-29 are objected to because of the following informalities: 
Independent claims 1, 19 and 29 have been amended to recite "wherein said video
linking system samples said video content at a sample rate which is a multiple of plural
standard playback rates." Nonetheless, based on applicant's disclosure (see
paragraphs 52-53), it appears that applicant intends the sampling rate to be a divisor of
plural standard playback rates, or the standard playback rates to be multiples of the
sampling rate (since the specification discloses a sample rate of three frames per second
relative to standard playback rates of 30, 15, etc.). Thus, this will be the interpretation
utilized by the examiner in the subsequent rejection. Nonetheless, appropriate
correction is required concerning the current claim language.
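
For illustration only, the arithmetic behind this interpretation can be sketched as follows. The function names are hypothetical and appear in neither the claims nor the specification; the rates of 30, 15 and 12 frames per second are those mentioned in this action.

```python
def is_common_divisor(sample_rate, playback_rates):
    """Return True if sample_rate divides every playback rate evenly."""
    return all(rate % sample_rate == 0 for rate in playback_rates)


def playback_frames_per_sample(sample_rate, playback_rates):
    """Map each playback rate to the number of playback frames per sampled frame."""
    return {rate: rate // sample_rate for rate in playback_rates}


# Illustrative values: 3 frames per second against 30, 15 and 12 frames per second.
rates = [30, 15, 12]
print(is_common_divisor(3, rates))
print(playback_frames_per_sample(3, rates))
```

Under this reading, a sample rate of 3 frames per second is a divisor of each listed playback rate, whereas a sample rate of 30 would not be a divisor of 15 or 12.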

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 1-3, 11, 13-17, 19-27 and 29 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Rangan (6198833) in view of Feinleib (6637032) in further view
of Courtney (6424370).

Regarding claims 1 and 19, Rangan teaches an image processing system for 
processing video content in a sequence of video frames and linking one or more pixel 
objects embedded in said video content to selected data objects in a sequence of video 
frames by explaining a system is provided for tracking a moving entity in a video 
presentation, the system comprising a computer station presenting the video 
presentation on a display as a series of bitmapped frames; and a tracking module 
receiving the video data stream. (Col 3, lines 26-29); said image processing system 
comprising a video capture system for capturing a frame of said sequence of video 
frames to be viewed defining a captured video frame by showing a recording function 
for accepting the positions wherein the pixel signature (defined in the art as a local 
neighborhood around given pixel) most closely matches the image signature as the true 
positions of the image entity in the next frames. (Col 3, lines 43-46) and in FIG. 1 input 
data stream 15 to tracking module 13 is a stream of successive bitmapped frames in a 
normalized resolution, required by the tracking module. (Col 5, lines 35-37) The 
authoring station can be based on virtually any sort of computer platform and operating 
system, and in a preferred embodiment, a PC station running MS Windows is used, in 
which case the input stream 16, regardless of protocol, is converted to a digital video 
format that can be interpreted and played back as a sequence of bitmapped frames. 
(Col 5, lines 37-43) Furthermore Rangan teaches a user interface for enabling a user 
to select one or more pixel objects in said captured frame defining selected pixel
objects. (Col 4 lines 11-35). Additionally Rangan teaches a pixel object tracking
system, which includes a processor, which automatically tracks said selected pixel
objects in other frames. (Col 3, lines 26-50). It should be noted that it is well known in
the art that a computer system would inherently contain a processor. Rangan also 
teaches said video linking system generating one or more linked video files, separate 
from said video content (Col 6 lines 48-51, Col 10 lines 53-66) by explaining when 
tracking element 29 (Fig. 2) is positioned and activated over an image entity to be 
tracked, a signature table is created and stored (Col 8, lines 40-42) and upon tracking 
element 29 being activated the tracking module creates a table or list comprising pixel 
values associated with a target number and spatial arrangement of pixels associated 
with tracking element 29. (Col 7, lines 40-43). Although Rangan does not explicitly state 
the generation of video files, it is inherent to the invention that a table or list, which is 
created by the tracking module and subsequently stored, must implicitly require files for 
storage function. Lastly, Rangan teaches through additional editing processes, a 
moving region associated with the image entity in a display may be made to be 
interactive and identifiable to an end user. (Col 6, lines 55-57). Rangan further teaches 
user interaction with such an image entity during viewing of a video can be programmed 
to provide additional network-stored information about that entity to suitable customer
premises equipment (CPE) adapted to receive and display that information (Col 6, lines 
57-62) and such further information may be displayed, for example, as an overlay on 
the display of the dynamic video containing the subject image entity. (Col 6, lines 62-
64) It should be noted that Rangan further teaches providing one or more links to
predetermined data objects for each pixel object. (Col 7 lines 25-52, Fig. 2) 
Nonetheless, Rangan fails to explicitly teach said video linking system generating one 
or more linked video files separate from said video content, being configured to identify 
the pixel objects by frame number and location within the frame. It would have been 
obvious to one of ordinary skill in the art at the time the invention was made to utilize 
user editing processes and programmable capabilities of stored information about an 
image entity, as taught by Rangan, to identify the pixel objects by frame number and
location within a frame because it is well known in the art that stored information about 
an image entity will include information about the image object's frame number and 
location within the frame in order to properly retrieve and display that information. 
Furthermore, these user programmable abilities allow advertisers, product promoters, or 
the like to present information to end users based on user interaction with an associated
entity in a dynamic video display. (Col 6, lines 64-67) Rangan also teaches linked video
files are synchronized with said video content. (Col 6, lines 48-51 and Col 10, lines 53-
56) Furthermore Rangan teaches wherein said linked video files are configured so that
selected locations in said video frames by a pointing device during playback of the video
content can be linked with said data objects when said selected locations correspond
to said pixel objects. (Col 7 lines 35-52) It should be noted that the pointing device as taught
by Rangan is a mouse. However, Rangan does not explicitly teach information not
embedded in video content. This is what Feinleib teaches. (Col 3 lines 51-65, Col 9
lines 27-39, Col 11 lines 17-27) It should be noted that Feinleib teaches linked video
files for enhancing content separate from and not embedded in video content by
teaching that enhancing content may reside in a viewer's home and is synchronized by a
closed caption script of the primary content, with the synchronization independent of
how and when the enhancing content or primary content is delivered to the viewer
computing units. (Col 6 lines 23-30) Again, Feinleib explicitly teaches enhancing
content can be delivered independently of the primary content and synchronized at 
the viewer-computing unit using the closed captioning script, which accompanies the 
primary content. (Col 9 lines 30-40). It would have been obvious to one of ordinary skill 
in the art at the time the invention was made to combine the teachings of generating 
one or more linked video files separate from and not embedded in video content into the 
system of Rangan because enhancements to primary content can be timely introduced
at desired junctures of the primary content. (Col 2 lines 14-20) However neither 
Rangan nor Feinleib explicitly teaches a video linking system which samples video 
content at a sample rate which is a divisor of plural standard playback rates. This is 
what Courtney teaches (Col 15 line 28-Col 16 line 24, Fig. 24, Col 16 line 51-Col 7 line 
40) It should be noted that Courtney teaches 315 frames captured at approximately 3 
frames per second (Col 16 lines 8-25) and it is well known in the art to utilize standard 
playback rates such as NTSC at 30 FPS and FPS at 12 FPS. It would have been 
obvious to one of ordinary skill in the art at the time the invention was made to combine 
the teachings of sampling video content at 3 frames per second (a divisor of a plurality 
of standard playback rates) into the combination of Rangan and Feinleib because 
testing the quality of motion based event detection of differing frame rates (such as 3 
frames per second) can be achieved and thus, providing more intelligent feedback
regarding the occurrence of complex object actions such as inventory theft (Col 2 lines 
59-63) can be realized. 

Claim 29 is similar in scope to claim 1 except for the recitation of clustering the 
sampled video content with plural frames per cluster. Courtney also teaches this (Col 
15 line 28-Col 16 line 24, Fig. 24, Col 16 line 51-Col 7 line 40). For example, it should 
be noted that Courtney teaches a 315 frame cluster captured at 3 frames per second. It 
would have been obvious to one of ordinary skill in the art at the time the invention was 
made to combine the teachings of a 315 frame cluster sampled at 3 frames per second
into the combination of Rangan and Feinleib because testing the quality of motion 
based event detection of differing frame rates (such as 3 frames per second) can be 
achieved and thus, providing more intelligent feedback regarding the occurrence of 
complex object actions such as inventory theft (Col 2 lines 59-63) can be realized. 

Regarding claims 17 and 27, Courtney teaches clustering the sampled video 
content with plural frames per cluster. (Col 15 line 28-Col 16 line 24, Fig. 24, Col 16 line 
51-Col 7 line 40). For example, it should be noted that Courtney teaches a 315 frame 
cluster captured at 3 frames per second. It would have been obvious to one of ordinary 
skill in the art at the time the invention was made to combine the teachings of a 315 frame
cluster sampled at 3 frames per second into the combination of Rangan and Feinleib 
because testing the quality of motion based event detection of differing frame rates 
(such as 3 frames per second) can be achieved and thus, providing more intelligent 
feedback regarding the occurrence of complex object actions such as inventory theft
(Col 2 lines 59-63) can be realized. 

Regarding claims 2 and 20, Courtney teaches sampling said video content at a 
sample rate of a divisor of 30 frames per second and 12 frames per second. (Col 15 
line 28-Col 16 line 24, Fig. 24, Col 16 line 51-Col 7 line 40) It should be noted that 
Courtney teaches 315 frames captured at approximately 3 frames per second (Col 16 
lines 8-25) It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine the teachings of sampling video content at 3 frames per 
second (a divisor of a plurality of standard playback rates) into the combination of 
Rangan and Feinleib because testing the quality of motion based event detection of 
differing frame rates (such as 3 frames per second) can be achieved and thus, providing 
more intelligent feedback regarding the occurrence of complex object actions such as 
inventory theft (Col 2 lines 59-63) can be realized. 

Regarding claims 3 and 21, Courtney teaches a sample rate of at least 3 frames 
per second. (Col 15 line 28-Col 16 line 24, Fig. 24, Col 16 line 51-Col 7 line 40) It should 
be noted that Courtney teaches 315 frames captured at approximately 3 frames per 
second (Col 16 lines 8-25) It would have been obvious to one of ordinary skill in the art 
at the time the invention was made to combine the teachings of sampling video content 
at 3 frames per second (a divisor of a plurality of standard playback rates) into the 
combination of Rangan and Feinleib because testing the quality of motion based event 
detection of differing frame rates (such as 3 frames per second) can be achieved and 
thus, providing more intelligent feedback regarding the occurrence of complex object
actions such as inventory theft (Col 2 lines 59-63) can be realized. 

Regarding claims 13, 16 and 23 and 26, Courtney teaches sampling said video 
content at a sample rate of a multiple of NTSC and PAL (movie) frame rates. (Col 15 
line 28-Col 16 line 24, Fig. 24, Col 16 line 51-Col 7 line 40) It should be noted that
Courtney teaches 315 frames captured at approximately 3 frames per second (Col 16 
lines 8-25) and it is well known in the art to utilize standard playback rates such as 
NTSC at 30 FPS and FPS at 12 FPS. It would have been obvious to one of ordinary 
skill in the art at the time the invention was made to combine the teachings of sampling 
video content at 3 frames per second (a divisor of a plurality of standard playback rates) 
into the combination of Rangan and Feinleib because testing the quality of motion 
based event detection of differing frame rates (such as 3 frames per second) can be 
achieved and thus, providing more intelligent feedback regarding the occurrence of 
complex object actions such as inventory theft (Col 2 lines 59-63) can be realized. 

Regarding claims 14-15, 24-25 Courtney teaches sampling video content at a 
sample rate of a multiple of NTSC, PAL, 15 FPS and 12 FPS frame rates. (Col 15 line 
28-Col 16 line 24, Fig. 24, Col 16 line 51-Col 7 line 40) It should be noted that Courtney
teaches 315 frames captured at approximately 3 frames per second (Col 16 lines 8-25) 
and it is well known in the art to utilize standard playback rates such as NTSC at 30 
FPS and FPS at 12 FPS. It would have been obvious to one of ordinary skill in the art 
at the time the invention was made to combine the teachings of sampling video content 
at 3 frames per second (a divisor of a plurality of standard playback rates) into the 
combination of Rangan and Feinleib because testing the quality of motion based event
detection of differing frame rates (such as 3 frames per second) can be achieved and 
thus, providing more intelligent feedback regarding the occurrence of complex object 
actions such as inventory theft (Col 2 lines 59-63) can be realized. 

Regarding claims 11 and 22, Rangan teaches including a video playback
application for playing back video content and said linked video files, wherein said video 
playback application is configured to determine if locations selected by a pointing device 
during playback of the video content correspond to said predetermined pixel objects and 
provide a link to a data object when said selected location corresponds to one of said 
determined pixel objects. (Col 7 lines 35-52) It should be noted that the pointing device as
taught by Rangan is a mouse.
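
By way of a hedged illustration, the claimed determination of whether a selected location corresponds to a pixel object might be sketched as below. The record layout (frame number plus a rectangular region), the class name and the function name are assumptions made for this sketch only; none of the cited references is asserted to store its data in this form.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PixelObjectRegion:
    """Hypothetical linked-file record: where a pixel object sits in one frame."""
    frame: int    # frame number within the video content
    x: int        # left edge of the region, in pixels
    y: int        # top edge of the region, in pixels
    width: int
    height: int
    link: str     # identifier of the linked data object


def find_link(regions: Iterable[PixelObjectRegion],
              frame: int, click_x: int, click_y: int) -> Optional[str]:
    """Return the link for the first region containing the click, else None."""
    for region in regions:
        inside_x = region.x <= click_x < region.x + region.width
        inside_y = region.y <= click_y < region.y + region.height
        if region.frame == frame and inside_x and inside_y:
            return region.link
    return None


# Illustrative data only.
linked_file = [
    PixelObjectRegion(frame=10, x=100, y=50, width=40, height=30, link="object-A"),
    PixelObjectRegion(frame=11, x=104, y=52, width=40, height=30, link="object-A"),
]
print(find_link(linked_file, frame=10, click_x=120, click_y=60))
print(find_link(linked_file, frame=10, click_x=10, click_y=10))
```

Because the records are held apart from the video frames themselves, the sketch is consistent with linked data that is separate from, and not embedded in, the video content.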

Claims 4-5 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Rangan (6198833) in view of Feinleib (6637032) in further view of Courtney (6424370) 
and Toklu (6549643). 

Regarding claim 4, Rangan, Feinleib and Courtney do not explicitly teach said 
video linking system is configured to identify segment breaks in said video content. 
This is what Toklu teaches. Toklu teaches video summarization methods typically 
include segmenting a video into an appropriate set of segments such as video "shots" 
and selecting one or more key-frames from the shots. (Col 1, lines 34-37) It should be
noted that a key-frame is defined in the art to be a frame used to indicate the beginning 
or end of a change made to the signal and therefore, an implied segment break. It 
would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine video summarization methods configured to identify
segment breaks as taught by Toklu with the image processing system as taught by 
Rangan in order to reduce the number of images to one or more key-frames to 
represent the content of a given shot (Col 1, lines 43-45) and thus, to generate a video
summary. (Col 1, line 33). 

Regarding claim 5, Rangan, Feinleib and Courtney do not explicitly teach said 
segment breaks are determined by determining the median average pixel values for a 
series of frames and comparing changes in the pixel values relative to the median 
average and indicating a segment break when the change in pixel values represents at 
least a predetermined change relative to the median average. This is what Toklu 
teaches. Toklu teaches determining median average pixel values for a series of 
frames by showing computing an average of an absolute pixel-based intensity 
difference between consecutive frames in each segment, and for each segment, 
computing a cumulative sum of the average of the absolute pixel-based intensity 
differences for the corresponding frames of the segment. (Col 3, lines 61-67) Toklu 
also teaches comparing changes in pixel values relative to median average by 
explaining selecting the first frame in each motion activity segment of a given segment 
frame if the cumulative sum of the average of the absolute pixel-based intensity 
differences for the frames of the given segment does not exceed a first predefined 
threshold. (Col 4, lines 1-5) Lastly, Toklu teaches indicating a segment break when the 
change in pixel values represents at least a predetermined change relative to the 
median average by showing selecting a predefined number of key-frames in the given 
segment uniformly, if the cumulative sum of the average of the absolute pixel-based
intensity differences for the frames of the given segment exceeds the first predefined 
threshold. (Col 4, lines 5-9) It should be noted that a key-frame is defined in the art to 
be a frame used to indicate the beginning or end of a change made to the signal and 
therefore an implied segment break. It would have been obvious to one of ordinary 
skill in the art at the time the invention was made to combine determining the
average pixel values for a series of frames, comparing changes in pixel values relative 
to the average and indicating a segment break when the change in pixel values 
represents at least a predetermined change relative to the median average as taught 
by Toklu with the image processing system as taught by Rangan in order to measure a 
temporal activity curve for dissimilarity based on frame differences (Col 3, lines 60-62)
and thus make possible the system and method for selecting key-frames from video
data. (Col 3, lines 51-59)
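
For illustration only, one possible reading of the claim 5 limitation (a segment break indicated when a frame's average pixel value departs from the median average of a series of frames by at least a predetermined amount) is sketched below. This is not asserted to be Toklu's method or applicant's implementation; the function name, the relative threshold of 0.5 and the use of the frames since the last break as the series are all assumptions.

```python
from statistics import mean, median


def find_segment_breaks(frames, min_relative_change=0.5):
    """Return indices of frames that start a new segment.

    frames: a sequence of frames, each a flat sequence of pixel values.
    A break is flagged when a frame's average pixel value differs from the
    median of the averages in the current segment by at least
    min_relative_change times that median (an assumed criterion).
    """
    breaks = []
    segment_averages = []  # per-frame averages since the last break
    for index, frame in enumerate(frames):
        average = mean(frame)
        if segment_averages:
            reference = median(segment_averages)
            change = abs(average - reference)
            if change >= min_relative_change * max(reference, 1e-9):
                breaks.append(index)
                segment_averages = []  # start a new series at the break
        segment_averages.append(average)
    return breaks


# Illustrative frames: a dark scene, a bright scene, then a dark scene again.
frames = [[10, 10], [10, 12], [11, 11], [200, 200], [198, 202], [10, 10]]
print(find_segment_breaks(frames))
```

With the illustrative frames above, breaks are reported at indices 3 and 5, where the average pixel value shifts sharply relative to the running median.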

Claims 18 and 28 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Rangan (6198833) in view of Feinleib (6637032) in further view of Courtney 
(6424370) and Toyama (5204749). 

Regarding claims 18 and 28, neither Rangan nor Feinleib explicitly teaches 
automatically determining changes in the characteristics of said one or more pixel 
objects based upon changes in lighting and automatically compensating based upon
those changes. This is what Toyama teaches. (Col 3 lines 59-62, Col 13 line 40-Col 14
line 50, Fig. 9) It should be noted that Toyama teaches automatically detecting changes 
in the follow-up field of the object (automatically shifting color coordinate plane of values 
(R-Y/Y) and (B-Y/Y) from points A0, B0, C0 to A1, B1, C1 [Fig. 9]). Furthermore it
should be noted that Toyama teaches said changes in the color difference signals are 
based on (accounting for) changes in lighting and also permits stable follow-up by 
automatically compensating for variations in luminance of the illuminating light. It would 
have been obvious to one of ordinary skill in the art at the time the invention was made 
to combine the teachings of automatically determining changes of one or more pixel 
objects based upon changes in lighting and automatically compensating for those 
changes into the system of Rangan because prevention of each of the points on the 
coordinate system from coming close to the origin or moving farther away from the origin
(due to luminance of light varying with time) while the object is not moving can be
realized (Col 15 lines 48-54) and thus, stably performing a follow-up operation despite
variations in luminance of illuminating light (Col 3 lines 59-62) can be achieved.

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Kevin K. Xu whose telephone number is 571-272-7747. 
The examiner can normally be reached from 8:30 AM to 5:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Mark Zimmerman, can be reached on 571-272-7653. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 




Kevin Xu 